Abstract: It is a big challenge to guarantee the quality of relevance features discovered in text documents for describing user preferences, because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to use large-scale patterns effectively remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns.
Keywords: Text mining; text feature extraction; text classification